ENVX2001 Applied Statistical Methods
The University of Sydney
Mar 2025
Each unit has an equal chance of being selected.
Imagine tossing 10 random points onto a landscape.
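A minimal sketch of that idea in R, assuming a hypothetical 100 m × 100 m study area (the extent and the name `pts` are illustrative, not from the lecture). Every location has the same chance of being picked because both coordinates are drawn from a uniform distribution.

```r
# Simple random sampling of locations: 10 points dropped uniformly
# onto a HYPOTHETICAL 100 m x 100 m landscape (extent is an assumption)
set.seed(2025)                     # fixed seed so the draw is reproducible
n_points <- 10
pts <- data.frame(
  x = runif(n_points, min = 0, max = 100),  # easting in metres
  y = runif(n_points, min = 0, max = 100)   # northing in metres
)
pts
```

By chance alone, some parts of the area may receive several points and others none.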
If an area has:
With simple random sampling:
If studying plant biodiversity in a national park:
Once we have our stratified sample, we need to:
All of these steps must account for our stratified design.
Soil carbon content was measured at 7 locations across the area. The amounts were: 48, 56, 90, 78, 86, 71, 42 tonnes per hectare (t/ha).
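As a baseline, the seven values can be treated as a simple random sample. The sketch below computes the mean, the variance of the mean, and a 95% confidence interval with n - 1 degrees of freedom. The object names are illustrative.

```r
# Soil carbon (t/ha) at the 7 locations given in the text
carbon <- c(48, 56, 90, 78, 86, 71, 42)

n            <- length(carbon)
srs_mean     <- mean(carbon)
srs_var_mean <- var(carbon) / n          # variance of the mean
srs_se       <- sqrt(srs_var_mean)       # standard error of the mean
t_crit       <- qt(0.975, df = n - 1)    # two-sided 95% critical value

srs_ci <- c(
  L95 = srs_mean - t_crit * srs_se,
  U95 = srs_mean + t_crit * srs_se
)
round(c(mean = srs_mean, var_mean = srs_var_mean, srs_ci), 2)
```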
The pooled mean is our best estimate of the overall population mean, taking into account the different stratum sizes.
\bar{y}_{s} = \sum_{i=1}^L \bar{y}_i \times w_i
In simple terms:
We first define the weights w_i for each stratum based on their area:
Then we calculate the weighted mean:
This is like saying: “62% of our land has soil carbon like land type A, and 38% has soil carbon like land type B, so our overall estimate takes both into account in these proportions.”
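A short sketch of the weighted mean in R. The weights (62% and 38%) come from the text. The allocation of the seven measurements to the two land types is an assumption made for illustration, because the split is not shown here.

```r
# Area weights from the text: 62% land type A, 38% land type B
w <- c(A = 0.62, B = 0.38)

# ASSUMED split of the seven measurements between the two strata
landA <- c(90, 78, 86, 71)
landB <- c(48, 56, 42)

stratum_means <- c(A = mean(landA), B = mean(landB))

# Weighted mean: sum of (stratum mean x stratum weight)
weighted_mean <- sum(stratum_means * w)

# Base R's weighted.mean() gives the same answer
same_answer <- all.equal(weighted_mean, weighted.mean(stratum_means, w))

weighted_mean
```

Under this assumed split the estimate is about 68.9 t/ha, slightly higher than the unweighted mean of all seven values because the larger stratum also has the higher carbon content.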
SE(\bar y_{s}) = \sqrt{\color{blue}{{\sum_{i=1}^L w_i^2}} \times \frac{s_i^2}{n_i}}
What’s different?
df = n - L
where n is the total number of samples and L is the number of strata.
95\%\ CI = \bar y_{s} \pm t^{0.025}_{n-L} \times SE(\bar y_{s})
where L is the number of strata, n is the total number of samples, and \bar y_{s} is the weighted mean of the strata.
In simple terms:
Lower bound ← [Pooled mean - Margin of error] ... [Pooled mean + Margin of error] → Upper bound
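The code that follows uses `landA`, `landB`, `weight`, `weighted_mean` and `t_crit`, whose definitions are not shown. A minimal setup might look like the sketch below. The split of the seven measurements between strata is an assumption, and `n_total` and `n_strata` are illustrative names.

```r
# ASSUMED allocation of the seven measurements to the two strata
landA <- c(90, 78, 86, 71)
landB <- c(48, 56, 42)

# Stratum weights (share of total area), from the text
weight <- c(0.62, 0.38)

# Weighted (pooled) mean
weighted_mean <- weight[1] * mean(landA) + weight[2] * mean(landB)

# Degrees of freedom: total samples minus number of strata (df = n - L)
n_total  <- length(landA) + length(landB)
n_strata <- 2
t_crit   <- qt(0.975, df = n_total - n_strata)  # two-sided 95% critical value
```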
varA <- var(landA) / length(landA) # variance of the mean for stratum A
varB <- var(landB) / length(landB) # variance of the mean for stratum B
weighted_var <- weight[1]^2 * varA + weight[2]^2 * varB # weights are squared, as in the SE formula
weighted_se <- sqrt(weighted_var)
ci <- c(
  L95 = weighted_mean - t_crit * weighted_se,
  U95 = weighted_mean + t_crit * weighted_se
)
ci
     L95      U95 
61.04864 76.68803 
What if we had used stratified random sampling instead of simple random sampling (and collected the same amount of data)?
| Design | Mean | Var (mean) | L95 | U95 | df |
|---|---|---|---|---|---|
| Simple Random | 67.29 | 50.80 | 49.85 | 84.73 | 6 |
| Stratified Random | 68.87 | 9.25 | 61.05 | 76.69 | 5 |
# Creating a visual comparison of confidence intervals
# Assumes `compare` is a data frame holding the table above,
# with columns Design, Mean, L95 and U95
library(ggplot2)
ggplot(compare, aes(x = Design, y = Mean)) +
  geom_point(size = 3) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_errorbar(aes(ymin = L95, ymax = U95), width = 0.2, linewidth = 1) +
  labs(title = "95% Confidence Intervals by Sampling Design",
       y = "Soil Carbon (tonnes/ha)",
       x = "") +
  theme_minimal(base_size = 14) +
  # x = 2 is the second level of Design, i.e. "Stratified Random"
  annotate("text", x = 2, y = 55,
           label = "Stratified sampling gives a\nnarrower confidence interval\n(more precise estimate)",
           color = "blue")

How many samples would we have had to collect using simple random sampling to achieve the same precision as our stratified sample?
So we would need about 38 samples with simple random sampling to get the same precision that we achieved with just 7 samples using stratified sampling!
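One way to arrive at a figure like this, sketched under a simplifying assumption: under simple random sampling the variance of the mean is s²/n, so setting it equal to the stratified variance of the mean and solving gives n = s² / Var(stratified mean). The stratified variance is recovered here from the 95% interval printed earlier, and the change in the t critical value with n is ignored.

```r
# Soil carbon (t/ha) at the 7 locations given in the text
carbon <- c(48, 56, 90, 78, 86, 71, 42)

# Recover the stratified SE from the printed 95% CI (df = 5)
half_width <- (76.68803 - 61.04864) / 2
strat_se   <- half_width / qt(0.975, df = 5)
target_var <- strat_se^2               # variance of the stratified mean

# Sample size needed under simple random sampling: n = s^2 / target variance
n_srs <- var(carbon) / target_var
round(n_srs)
```

In practice the result would be rounded up to the next whole sample.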
What if we come back and do another set of soil carbon measurements?
The difference between the means of the two sets of measurements.
\Delta \bar y = \bar y_2 - \bar y_1
where \bar y_2 and \bar y_1 are the means of the second and first set of measurements, respectively.
This tells us how precise our estimate of the change is. It depends on:
Var(\Delta{\bar y}) = Var(\bar y_2) + Var(\bar y_1) - 2 \times Cov(\bar y_2, \bar y_1)
In simple terms:
Important: Visiting the same sites twice (paired sampling) usually gives more precise estimates of change than visiting different sites each time!
Covariance measures how two measurements relate to each other:
Example with soil carbon:
What do you notice? Sites with high carbon in the first measurement still have high carbon in the second measurement (positive covariance).
Why this matters: Knowing the first measurement helps us predict the second one, reducing uncertainty in our estimate of change.
Practical takeaway: When measuring change over time, returning to the same sites usually gives more precise results because it removes site-to-site variation.
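The effect of the covariance term can be sketched with made-up numbers. The two vectors below are hypothetical measurements at five revisited sites, invented for illustration only. They are not the lecture's data.

```r
# HYPOTHETICAL soil carbon (t/ha) at five sites on two occasions
first  <- c(48, 56, 90, 78, 86)
second <- c(51, 60, 95, 80, 91)
n <- length(first)

# Positive covariance: sites high the first time are high the second time
cov_sites <- cov(first, second)

# Variances and covariance of the two means
var_y1    <- var(first) / n
var_y2    <- var(second) / n
cov_means <- cov_sites / n

# Var(change) = Var(y2) + Var(y1) - 2 * Cov(y2, y1)
var_change_paired <- var_y2 + var_y1 - 2 * cov_means
# If different sites were visited each time, the covariance term would be zero
var_change_indep  <- var_y2 + var_y1

# The paired result equals the variance of the mean site-level difference
identity_holds <- all.equal(var_change_paired, var(second - first) / n)

c(paired = var_change_paired, independent = var_change_indep)
```

With these invented values the paired variance is far smaller, because most of the site-to-site variation cancels when each site is compared with itself.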
95\%\ CI = \Delta \bar y \pm t^{0.025}_{n-1} \times SE(\Delta \bar y)
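A sketch of this interval computed step by step for paired data, again using hypothetical measurements at five revisited sites (invented for illustration). The result is checked against base R's `t.test()` with `paired = TRUE`.

```r
# HYPOTHETICAL soil carbon (t/ha) at five sites on two occasions
first  <- c(48, 56, 90, 78, 86)
second <- c(51, 60, 95, 80, 91)

d        <- second - first          # change at each site
n        <- length(d)
delta    <- mean(d)                 # estimated mean change
se_delta <- sd(d) / sqrt(n)         # standard error of the mean change
t_crit   <- qt(0.975, df = n - 1)   # two-sided 95% critical value

ci_hand <- c(delta - t_crit * se_delta, delta + t_crit * se_delta)

# Same interval from the built-in paired t-test
ci_ttest <- as.numeric(t.test(second, first, paired = TRUE)$conf.int)

all.equal(ci_hand, ci_ttest)
```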
Good news! You don’t need to calculate this by hand!
Use the t.test() function:
paired = TRUE option when the same sites are measured both times
paired = FALSE option when different sites are measured each time
This presentation is based on the SOLES Quarto reveal.js template and is licensed under a Creative Commons Attribution 4.0 International License.